Back to NotebooksBack
DownloadStability Test_v1
Linear Regression Stability Test Formula
The stability test uses linear regression to detect temporal trends in precipitation data:
Linear Model: $$Y_t = \beta_0 + \beta_1 \cdot t + \epsilon_t$$
Where:
- $Y_t$ = precipitation value at time $t$
- $\beta_0$ = intercept (initial value)
- $\beta_1$ = slope (rate of change over time)
- $t$ = time index (0, 1, 2, ..., n-1)
- $\epsilon_t$ = random error term
Hypothesis Testing:
- $H_0$: $\beta_1 = 0$ (no trend, data is stable)
- $H_1$: $\beta_1 \neq 0$ (significant trend exists, data is not stable)
Test Statistic: $$t = \frac{\beta_1}{SE(\beta_1)}$$
P-value Calculation: $$p\text{-value} = 2 \times P(T_{n-2} \geq |t|)$$
Where:
- $T_{n-2}$ follows a t-distribution with $(n-2)$ degrees of freedom
- $n$ = number of data points
- The factor of 2 accounts for the two-tailed test
Decision Rule:
- If $p\text{-value} < \alpha$ (0.05): Reject $H_0$ → Station FAILS (has significant trend)
- If $p\text{-value} \geq \alpha$ (0.05): Accept $H_0$ → Station PASSES (no significant trend)
Additional Metrics:
- $R^2$: Coefficient of determination (proportion of variance explained by time)
- Standard Error: $SE(\beta_1) = \sqrt{\frac{MSE}{\sum(t_i - \bar{t})^2}}$
Linear Regression Stability Test - Step by Step Process
This script performs a temporal stability test on precipitation data using linear regression analysis. Here's how it works:
Step 1: Data Preparation
- Load precipitation data from Excel file containing multiple station columns
- Extract numeric columns (station data) excluding the Date column
- Create a time index array (0, 1, 2, ..., n-1) representing temporal sequence
Step 2: Linear Regression Analysis
For each station (numeric column):
- Fit Linear Model: Apply
scipy.stats.linregress()to model precipitation vs. time - Extract Parameters:
- Slope (β₁): Rate of change in precipitation over time
- Intercept (β₀): Initial precipitation value
- R-squared: Proportion of variance explained by the temporal trend
- P-value: Statistical significance of the trend
- Standard Error: Uncertainty in the slope estimate
Step 3: Hypothesis Testing
- Significance Level: α = 0.05 (95% confidence)
- Decision Rule:
- If p-value < 0.05 → PASSED (significant trend detected, data is NOT stable)
- If p-value ≥ 0.05 → FAILED (no significant trend, data is stable)
Step 4: Visualization
Four subplots are generated:
- Slope Plot: Shows trend direction and magnitude (green = passed, red = failed)
- P-value Plot: Displays statistical significance with α threshold line
- R-squared Plot: Indicates strength of temporal relationship
- Summary Statistics: Overall test results and pass/fail counts
Step 5: Results Export
- Detailed results are saved to Excel file
- Output includes all regression parameters and test outcomes for each station
Note: In this context, "passing" means detecting a significant trend, which may indicate climate change or data instability.
Python
import pandas as pd
import numpy as np
from scipy import statsPython
file_source = r"\Data\grids_precipitation_data.xlsx"Python
# Load the data
df = pd.read_excel(file_source)
# Get numeric columns (excluding Date column)
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()Python
import matplotlib.pyplot as plt
# Define significance level
alpha = 0.05
# Perform linear regression for each station
regression_results = {}
time_index = np.arange(len(df))
for col in numeric_cols:
# Perform linear regression
slope, intercept, r_value, p_value, std_err = stats.linregress(time_index, df[col])
regression_results[col] = {
'slope': slope,
'intercept': intercept,
'r_squared': r_value**2,
'p_value': p_value,
'std_err': std_err
}
# Convert regression_results dictionary to DataFrame
results_df = pd.DataFrame.from_dict(regression_results, orient='index')
# Add a column to indicate if the trend is statistically significant
results_df['significant'] = results_df['p_value'] < alpha
results_df['passed_test'] = results_df['significant']
# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# Plot 1: Slope values with significance
ax1 = axes[0, 0]
colors = ['green' if sig else 'red' for sig in results_df['passed_test']]
ax1.bar(results_df.index.astype(str), results_df['slope'], color=colors, alpha=0.7)
ax1.axhline(y=0, color='black', linestyle='--', linewidth=0.5)
ax1.set_xlabel('Station')
ax1.set_ylabel('Slope (trend)')
ax1.set_title('Trend Slope by Station\n(Green=Passed, Red=Failed)')
ax1.tick_params(axis='x', rotation=90, labelsize=6)
# Plot 2: P-values
ax2 = axes[0, 1]
ax2.bar(results_df.index.astype(str), results_df['p_value'], color=colors, alpha=0.7)
ax2.axhline(y=alpha, color='blue', linestyle='--', linewidth=2, label=f'α={alpha}')
ax2.set_xlabel('Station')
ax2.set_ylabel('P-value')
ax2.set_title('P-values by Station')
ax2.legend()
ax2.tick_params(axis='x', rotation=90, labelsize=6)
# Plot 3: R-squared values
ax3 = axes[1, 0]
ax3.bar(results_df.index.astype(str), results_df['r_squared'], color=colors, alpha=0.7)
ax3.set_xlabel('Station')
ax3.set_ylabel('R-squared')
ax3.set_title('R-squared by Station')
ax3.tick_params(axis='x', rotation=90, labelsize=6)
# Plot 4: Summary statistics
ax4 = axes[1, 1]
ax4.axis('off')
passed = results_df['passed_test'].sum()
failed = len(results_df) - passed
summary_text = f"""
Test Summary (α = {alpha}):
━━━━━━━━━━━━━━━━━━━━━━
Stations Passed: {passed} ({passed/len(results_df)*100:.1f}%)
Stations Failed: {failed} ({failed/len(results_df)*100:.1f}%)
Total Stations: {len(results_df)}
Criteria: p-value < {alpha}
"""
ax4.text(0.1, 0.5, summary_text, fontsize=14, family='monospace',
verticalalignment='center')
plt.tight_layout()
plt.show()
# Print detailed results
print("\n" + "="*70)
print("STATION TEST RESULTS")
print("="*70)
print(results_df[['slope', 'p_value', 'r_squared', 'passed_test']].to_string())
print("\n" + "="*70)
print(f"Stations that PASSED (significant trend): {passed}/{len(results_df)}")
print(f"Stations that FAILED (no significant trend): {failed}/{len(results_df)}")
print("="*70)Python
# Save the results to Excel
output_file = r"\Data\linear_regression_results.xlsx"
results_df.to_excel(output_file, index=True)
print(f"Results saved to: {output_file}")